Signal transmission



Sept. 1, 1959. F. VILBIG. SIGNAL TRANSMISSION. Filed May 14, 1957. 2 Sheets.

United States Patent

SIGNAL TRANSMISSION

Friedrich Vilbig, Cambridge, Mass., assignor to the United States of America as represented by the Secretary of the Air Force

The invention described herein may be manufactured and used by or for the United States Government for governmental purposes without payment to me of any royalty thereon.

This invention relates to methods and systems for communication, particularly methods and systems for analyzing speech by converting it into electrical signals and then causing the signals to function in such manner as to synthesize speech. As used herein, the term speech embraces all vocal utterances.

Speech contains audible frequencies up to about 16 kc. Due to economic and technical considerations, the transmission band is cut down to about 3 kc. by low-pass filtering. This limitation does not influence the intelligibility very much, but intelligibility would be decreased considerably upon further limitation to lower frequencies.

To arrive at a better utilization of transmission lines, the speechband should be compressed still further without losing the necessary information content. Such a compression is achieved by a scan coding process at the transmitter, and a scan decoding one at the receiver side. This process requires a speech analyzer at the transmitter side and a speech synthesizer on the receiver side.

A consideration of instantaneous pictures of a speech spectrum displays two typical extremes: a harmonic line spectrum and a noise spectrum. The one which is produced depends upon the excitation of the speech sounds. Beyond this, the envelope of the spectrum is of importance, due to the resonant structure of the sound-source system. So we see that the speech spectrum curve contains the characteristics of the excitation function and of the system-function. In sound excitation there is also a distinction between voiced sounds or vowels and unvoiced sounds or non-vowels. If a vowel is produced, the air current coming out of the lungs will be rhythmically modulated by the vibration of the larynx. The fundamental frequency of the excited air pulses is defined as the pitch frequency. It varies for different persons, differing in frequency and inflection. In the speech spectrum of vowels, there are a number of frequency lines which are harmonics of the pitch frequency. This is the picture of the pitch frequency. Also, the appearance of a harmonic line spectrum is characteristic for a vowel excitation.

Uttering a non-vowel sound, e.g., a hissing sound, will not excite the larynx. A hiss will be produced by the air current, which is represented by a noise spectrum. In general, the appearance of a noise spectrum is characteristic for a non-vowel excitation.

The transmission of a so-called pitch-hiss signal indicates either the presence of a voiced sound or vowel, with the pitch frequency as fundamental in the line spectrum, or of an unvoiced sound with a noise spectrum. Beyond the pitch-hiss discrimination, the pitch-frequency signal is another important excitation characteristic. It indicates the frequency and the alteration in pitch frequency. We do not transmit the pitch frequency, but we add an artificial pitch frequency at the synthesizer side and control it by means of a pitch-frequency signal. Still further, it is advisable to combine the pitch-frequency signal with the pitch-hiss signal, for instance, by assuming that there is a hiss in the absence of pitch frequency. During intermission in speech, a system-function is not found for the hiss spectrum, resulting in no output.

Speechband compression, therefore, aims to narrow the frequency band without impairing the desired informational content. An analysis of a speech spectrum has shown a pitch-harmonic line spectrum for vowels and a noise-like spectrum for turbulent consonants (excitation function). The speech spectrum envelope of both types of spectra represents the system-function. By a speech spectrum analysis on the analyzer side, pitch frequency and system-function signals are derived. Both are transmitted to a synthesizer where the original speech spectrum is reproduced.

In the Friedrich Vilbig co-pending patent application, Serial No. 659,183, filed May 14, 1957, there is described and claimed a speech communication method and system wherein the speech spectrum is analyzed and pitch-frequency and system-function signals are derived. The speechband of 3 kc. is analyzed by 100 magnetostriction filters, each of which has a bandwidth of 30 c.p.s. The 100 magnetostriction filters have a common input and separate outputs. The single output voltages of these filters are rectified and then stored. The stored voltages are scanned by the rotating arm of a 100-contact mercury-jet switch and then filtered by a low-pass filter having a cut-off frequency of approximately 200 c.p.s. The envelope curve thus obtained is the system-function signal, and this curve is transmitted to a synthesizer in a channel of about 200 c.p.s. The arm of another 100-contact mercury-jet switch rotates synchronously with the analyzer switch and distributes the envelope curve voltage to the bias inputs of 50 modulators. These modulators control the harmonics of the pitch-frequency signal, or of hiss spectra, in the output of 50 pairs (100) of magnetostriction filters. The resulting output of the synthesizer is a frequency band whose spectrum is similar to that of the original speechband.

In the aforementioned scan speech coding system for a speechband of 3 kc., 100 filters and 50 modulators are utilized. If the pitch frequency is 100 c.p.s., the speechband contains 3000/100 = 30 harmonics. These harmonics are filtered and modulated by only 30 of the 100 filters and 15 of the 50 modulators. As may be readily seen, not all the channels are utilized. When the pitch frequency changes, some of the other channels are utilized instead. But always only a fraction of the filters and modulators are utilized.
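The harmonic count in the preceding paragraph can be checked with a short sketch. The function names are hypothetical, and the assumption that each harmonic occupies exactly one of the 100 analyzer filters is a simplification made here for illustration, not a statement from the specification.

```python
def harmonics_in_band(pitch_hz: int, band_hz: int = 3000) -> int:
    """Count the pitch harmonics that lie at or below the top of the speechband."""
    return band_hz // pitch_hz


def filters_in_use(pitch_hz: int, band_hz: int = 3000, total_filters: int = 100) -> float:
    """Fraction of filters carrying a harmonic, assuming one filter per harmonic."""
    return harmonics_in_band(pitch_hz, band_hz) / total_filters


if __name__ == "__main__":
    for pitch in (100, 150, 300):
        print(pitch, harmonics_in_band(pitch), filters_in_use(pitch))
```

Under these assumptions the fraction of occupied channels falls as the pitch frequency rises, which is the inefficiency the paragraph describes.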

In accordance with this invention, a scan speech coding method and system is provided whereby, for every pitch frequency, the same array of filters is utilized, thus requiring 30 synthesizer filters for 30 harmonics regardless of pitch frequency changes.

To avoid simultaneous modulation of adjacent harmonics of the pitch frequency, 50 pairs of magnetostriction filters and 50 modulators were utilized by the speech synthesizer in the Vilbig system of the aforementioned co-pending patent application. By dividing these into three groupings, and then combining the groups through three phase shifting networks, the bandwidth for each filter-modulator could be increased to 60 c.p.s. before adjacent harmonic modulation occurred. For a speechband of 3 kc. and a pitch frequency of 100 c.p.s., only 30 pairs of filters and 30 modulators are necessary. The extra 20 in the aforementioned co-pending application of Vilbig are necessary to provide for accompanying changes in pitch frequency. With the present filter-modulator arrangement, a satisfactory compromise is obtained between the cross-modulation and also any ringing effects that may be present in the filters.

In accordance with the present invention there is provided a method and apparatus to reduce the number of synthesizer filter-modulator channels and simultaneously provide an improved cross-modulation performance.

The present invention ideally solves the cross-modulation and ringing problems with regard to filters and modulators. Two pitch frequency pulse rates are used, with pulse repetition frequencies of F = 30 kc. and F + f = 30.1 to 30.3 kc. (assuming the pitch frequency f varies between 100 and 300 c.p.s.). The combined line spectrum of these two pulses provides 30 frequency pairs whose frequency difference corresponds to a harmonic of the pitch frequency. Because of the wide distances between frequencies of the neighboring pairs of filter-modulators, these are readily separated by filters equal in amplitude and phase, and since the smallest bandwidth now required is larger than 300 c.p.s. (highest pitch frequency), ringing effects will be absent. Rectifying the different frequency pairs produces 30 modulated harmonics of the pitch frequency. The input to the rectifier-modulators is the voltage of the envelope curve of the system-function signal, distributed by a rotating switch.
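The pairing of spectral lines described above can be illustrated numerically. The sketch below assumes ideal pulse trains whose spectra contain every integer multiple of the repetition frequency. The function name and the tuple layout are hypothetical.

```python
def line_pairs(fixed_hz: int, pitch_hz: int, count: int = 30):
    """Return (fixed line, shifted line, difference) for the first `count` pairs.

    The n-th harmonic of the fixed pulse rate F sits at n*F and the n-th
    harmonic of the shifted pulse rate F + f sits at n*(F + f), so the two
    lines of the n-th pair differ by n*f, the n-th pitch harmonic.
    """
    pairs = []
    for n in range(1, count + 1):
        fixed_line = n * fixed_hz
        shifted_line = n * (fixed_hz + pitch_hz)
        pairs.append((fixed_line, shifted_line, shifted_line - fixed_line))
    return pairs


if __name__ == "__main__":
    for fixed_line, shifted_line, diff in line_pairs(30_000, 100)[:3]:
        print(fixed_line, shifted_line, diff)
```

Rectifying (demodulating) the n-th pair would then yield a component at n times the pitch frequency, which is why 30 pairs give 30 pitch harmonics.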

Still further in accordance with the invention, there is provided a new system for speech synthesizing which permits optimum control of cross-modulation.

In the aforementioned co-pending patent application, the bandwidth of the synthesized speech was constant but the number of pitch harmonics changed with pitch inflection. In the present invention, the bandwidth changes in proportion to the pitch frequency but the number of pitch harmonics is constant. Fewer filters and modulators are therefore required and need only equal the prespecified number of pitch harmonics.

The various features of novelty which characterize this invention are pointed out with particularity in the claims annexed to and forming a part of this specification. For a better understanding of the invention, however, its advantages and specific objects obtained with its use, reference should be had to the accompanying drawings and descriptive matter in which is illustrated and described a preferred embodiment of the invention.

In the drawings:

Figure 1 is a block diagram of a system for synthesis of speech; and

Figure 2 shows a circuit diagram of a bridge-type pitch-hiss modulator utilized in the system of speech synthesis shown in Figure 1.

The pitch-frequency and system-function signals derived in the analyzer of the aforementioned co-pending patent application of Vilbig are transmitted to a speech synthesizer shown in Figure 1. The system-function signal is shown at the input terminal 100 of cathode follower 101.

Now referring to Figure 1, the vowel spectrum, containing a harmonic spectrum of the pitch frequency which is limited by the vowel envelope, must be synthesized on the receiver side. The elements for this synthesis are the voltage envelope curve 102, as a product of the aforementioned analysis, and the pitch-frequency signal, which also comes from the transmitter side. To get a harmonic spectrum of the pitch frequency, the pitch-frequency signal is fed into terminal 1, and it controls a pitch frequency oscillator 2 producing a sine wave covering a range of about ω = 100 to 300 c.p.s. This sine wave voltage is fed to the input of double push-pull modulator 4 by way of line 3, and double push-pull modulator 4 is also fed a carrier frequency of about Ω = 600 c.p.s. This carrier frequency Ω is generated in oscillator 5 and is fed into modulator 4 by way of line 6. The modulation products of modulator 4 are two sidebands, (Ω - ω) and (Ω + ω). The advantage of double push-pull modulation (also known as ring modulation) is that in the output of modulator 4, both input frequencies, namely the pitch frequency (100 to 300 c.p.s.) and the auxiliary carrier frequency (600 c.p.s.), are suppressed. The two sidebands are fed by way of line 7 to high-pass filter 8. The lower sideband (Ω - ω) is suppressed by high-pass filter 8, which has a cut-off frequency of 700 c.p.s. The output of filter 8 is the upper sideband (Ω + ω), which is about 700 to 900 c.p.s. This is fed to double push-pull modulator 10 by way of line 9. The pitch frequency may vary within the range 100 ≤ ω ≤ 300 c.p.s.; therefore, the sidebands vary correspondingly.

Double push-pull modulator 10 is fed simultaneously by the upper sideband (Ω + ω) and a carrier frequency α = 35,400 c.p.s. This carrier frequency is generated in oscillator 11 and is fed to modulator 10 by way of line 12. Accordingly, we get at the output of modulator 10 two sidebands, [α - (Ω + ω)] and [α + (Ω + ω)], which are fed to bandpass filter 13 by way of line 14. Filter 13 has a center frequency of 36,200 c.p.s. and a bandwidth of 200 c.p.s. Therefore, the upper sideband 36,100 ≤ (α + Ω + ω) ≤ 36,300 c.p.s. will be present at the output of filter 13 and will be fed to pulse generator 16 by way of line 15.
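The two modulation stages just described amount to a small frequency plan, sketched below. The sketch treats each double push-pull modulator as an ideal ring modulator and each filter as a brick-wall filter, both simplifying assumptions. All names are hypothetical, and the numeric values are the ones given in the text (600 c.p.s. audio carrier, 35,400 c.p.s. high frequency carrier, 700 c.p.s. high-pass cut-off, 36,100 to 36,300 c.p.s. passband).

```python
AUDIO_CARRIER_HZ = 600             # auxiliary carrier from oscillator 5
HF_CARRIER_HZ = 35_400             # carrier from oscillator 11
HIGH_PASS_CUTOFF_HZ = 700          # high-pass filter 8
BANDPASS_13_HZ = (36_100, 36_300)  # bandpass filter 13 (center 36,200, width 200)


def ring_modulate(signal_hz: int, carrier_hz: int):
    """Ideal ring modulator: both inputs are suppressed, only the sidebands remain."""
    return (carrier_hz - signal_hz, carrier_hz + signal_hz)


def variable_trigger_hz(pitch_hz: int) -> int:
    """Frequency that triggers pulse generator 16 for a given pitch frequency."""
    if not 100 <= pitch_hz <= 300:
        raise ValueError("pitch frequency outside the 100 to 300 c.p.s. range")
    # Stage 1: modulator 4, then high-pass filter 8 keeps the upper sideband.
    stage1 = [f for f in ring_modulate(pitch_hz, AUDIO_CARRIER_HZ)
              if f >= HIGH_PASS_CUTOFF_HZ]
    # Stage 2: modulator 10, then bandpass filter 13 keeps the upper sideband.
    low, high = BANDPASS_13_HZ
    stage2 = [f for f in ring_modulate(stage1[0], HF_CARRIER_HZ)
              if low <= f <= high]
    return stage2[0]


if __name__ == "__main__":
    for pitch in (100, 200, 300):
        print(pitch, variable_trigger_hz(pitch))
```

In this idealized model the trigger frequency is simply the pitch frequency shifted up by 36,000 c.p.s.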

This upper sideband voltage, fed to pulse generator 16 by way of line 15, serves as a trigger, and the output of the pulse generator consists of pulses of very short duration, so that the spectrum consists of harmonic lines of 36,100 to 36,300 c.p.s. fundamental frequency.

In a similar manner, double push-pull modulator 18, which has in common the carrier frequency α of double push-pull modulator 10, builds up an upper sideband frequency of (α + Ω) = 36,000 c.p.s. The lower sideband is suppressed by succeeding bandpass filter 20. In double push-pull modulator 18, the frequency to be modulated is the same Ω = 600 c.p.s. which was the carrier for modulator 4. Therefore, the lower and upper sidebands of the output of modulator 18 are fed to bandpass filter 20 by way of line 19. The lower sideband is suppressed in filter 20 and the upper sideband is available at the output of filter 20.

The upper sideband frequency voltage is fed to pulse generator 22 by way of line 21, and it triggers pulse generator 22, which generates a harmonic spectrum with 36,000 c.p.s. fundamental frequency. The spectrum available from pulse generator 22 is stable, whereas the spectrum and fundamental frequency from pulse generator 16 may vary from 36,100 to 36,300 c.p.s. according to the pitch of the speaker and the pitch inflection. If both spectra from generators 16 and 22 are superimposed, the result is a harmonic line spectrum containing neighboring line pairs having a distance of pitch frequency units. We may find the first pair having a distance ω, the second with a distance of 2ω, the third 3ω, and so on up to the 30th with a distance of 30ω, where 3000 ≤ 30ω ≤ 9000 c.p.s., corresponding to the pitch frequency. Where the carrier frequency Ω = 600 c.p.s. and α = 35,400 c.p.s., the pairs may be separated by 30 filters. The most critical separation is that of the 30th from the 29th pair. Here the bandwidth is 9000 c.p.s. and the distance about 36,000 c.p.s., which is 4 times the bandwidth. So the separation is accomplished by means of simple tuned high-Q filters, so that the attenuation of the neighboring filter is at least 40 db.
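The separation argument can be tabulated with a brief sketch. It assumes that the n-th filter must pass both lines of the n-th pair at the highest pitch frequency (300 c.p.s.), so that its required bandwidth is n times 300 c.p.s., and that neighboring pairs are spaced by the fixed 36,000 c.p.s. fundamental. The function names and the tuple layout are hypothetical.

```python
FIXED_HZ = 36_000  # fundamental of the fixed pulse spectrum (pulse generator 22)


def filter_plan(pitch_max_hz: int = 300, count: int = 30):
    """Return (n, lower line, upper line, required bandwidth) for each pair filter."""
    return [(n, n * FIXED_HZ, n * (FIXED_HZ + pitch_max_hz), n * pitch_max_hz)
            for n in range(1, count + 1)]


def separation_ratio(n: int, pitch_max_hz: int = 300) -> float:
    """Spacing between neighboring pairs divided by the n-th filter bandwidth."""
    return FIXED_HZ / (n * pitch_max_hz)


if __name__ == "__main__":
    for row in filter_plan()[:3]:
        print(row)
    print(separation_ratio(30))
```

The ratio is smallest for the 30th filter, where it equals 4, matching the figure quoted in the text for the most critical separation.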

The output of pulse generator 16, which is the pitch-pulse signal (36,100 to 36,300 c.p.s.), is fed by way of line 24 to pitch-hiss modulator 25, which is shown in Figure 2. Now referring to Figure 2, double pole-double throw switch 100 normally is in position 101 when a pitch-frequency signal coming from terminal 1 (corresponding to terminal 1 of Figure 1) is passed through amplifier 102, which actuates relay 103 so that switch 100 is in position 101. When a pitch-frequency signal is not present at terminal 1, the relay 103 is de-actuated and switch 100 makes contacts at points 102. When switch 100 is in position 101, V2 has a very negative potential, the bridge equilibrium of pitch-hiss bridge-type modulator 25 is destroyed, and the output spectrum of modulator 25 is qualitatively the same as the input spectrum. Pulse generator 16 (which is the same as shown in Figure 1) is shown in Figure 2 to indicate that the pitch-hiss generator also receives a carrier pulse input.

Now referring to Figure 1, the outputs of pulse generator 22 and modulator 25 are fed by way of lines 23 and 26, respectively, into conductor 27, which feeds separating filters 31 to 60, each connected to its corresponding modulator-demodulator 61-90. By modulation and demodulation of a line pair, the result is a multiple of the pitch frequency according to the number of the filter. The amplitude of each modulator is controlled by the voltage coming out of rotating switch 28, where the position corresponds to a certain envelope value. By connecting the modulated lines together, there is obtained the desired harmonic line spectrum of the vowel, which is filtered out by 6 kc. low-pass filter 29.
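The net effect of the filter and modulator-demodulator bank on a vowel can be approximated as an additive sum of pitch harmonics weighted by the envelope values. The sketch below is a signal-level illustration only, not a model of the circuit: the sample rate, the brick-wall treatment of the 6 kc. low-pass filter, and all names are assumptions.

```python
import math


def kept_harmonics(pitch_hz: float, count: int = 30, cutoff_hz: float = 6000.0):
    """Harmonic numbers whose frequency lies at or below the low-pass cut-off."""
    return [n for n in range(1, count + 1) if n * pitch_hz <= cutoff_hz]


def synthesize_vowel(pitch_hz: float, envelope, sample_rate: int = 40_000,
                     n_samples: int = 400):
    """Sum the pitch harmonics, each scaled by its envelope value.

    envelope[n - 1] plays the role of the voltage that the rotating switch
    delivers to the n-th modulator.
    """
    harmonics = kept_harmonics(pitch_hz, count=len(envelope))
    samples = []
    for k in range(n_samples):
        t = k / sample_rate
        samples.append(sum(envelope[n - 1] * math.sin(2 * math.pi * n * pitch_hz * t)
                           for n in harmonics))
    return samples


if __name__ == "__main__":
    # A made-up envelope that decays with harmonic number.
    envelope = [1.0 / n for n in range(1, 31)]
    wave = synthesize_vowel(150.0, envelope)
    print(len(wave), max(wave))
```

With a fixed count of 30 harmonics, a higher pitch frequency pushes more of them above 6 kc., so fewer survive the output filter in this model.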

In the present speech synthesis, the number of harmonic lines remains the same, but with pitch inflection the distance changes, and so does the bandwidth. The technical advantage of this synthesis is that it demands only a certain number of filters and modulators, equal to the desired number of lines. This means only 30, and there is another benefit in respect to the filters. The separation is much easier owing to the wide distance between adjacent line pairs, and there is no coupling between filters. Therefore, the difficulties of cross talk and interference are eliminated.

To synthesize a consonant, the voltage envelope curve has to contain a noise spectrum instead of a harmonic line spectrum. As with the harmonic line spectrum, the noise spectrum is summed from the components coming from filters 31-60. In the original scan noise coding system, co-pending patent application of Vilbig Serial No. 659,183 dated May 14, 1957, each filter was excited by noise to obtain components for the noise spectrum. This is not possible in the present invention since the 30 filters are of different bandwidths, starting with 300 c.p.s. for the first, adding 300 c.p.s. for each succeeding filter, and ending with 9000 c.p.s. for the 30th. The noise spectrum synthesis is achieved in the following way. First, a pitch frequency is necessary for the process. Since there is no pitch in the presence of a soundless consonant, the last pitch-frequency signal has to be kept on until the next vowel appears. This is achieved by storage device 99, which stores the output of filter 13; then, when there is no pitch-frequency signal, the stored signal in device 99 actuates pulse generator 16 until the next pitch-frequency signal appears. Pitch-hiss modulator 25 modulates the pitch-pulse signal (36,100 to 36,300 c.p.s.) by a 100 c.p.s. bandwidth noise spectrum coming from hiss generator 91 and passing through 100 c.p.s. high-pass filter 92. The carrier is suppressed by bridge-type modulator 25 (pitch-hiss modulator), which is shown in Figure 2. In Figure 2 the grids of modulating tubes V1 and V2 are brought to ground level by switch 100, because in the absence of a pitch-frequency signal the relay is de-actuated and switch 100 assumes position 102. Grounding the aforementioned grids achieves a balanced condition in modulator 25, thereby suppressing the carrier frequency. In addition to suppressing the carrier, a number of sideband pairs are created, where each pair lies on both sides of a suppressed carrier frequency.

To this spectrum is added the one from the output of pulse generator 22. Each line of the pulse spectrum from pulse generator 22 can be considered as a carrier for one of the noise sidebands. Each of filters 31-60 and its corresponding modulator-demodulator 61-90 selects a carrier and sideband combination. The product of the demodulator is a noise band of 200 c.p.s. bandwidth whose center frequency is a multiple of the pitch frequency, the multiple being the number of that filter among the 30. The sum of all the products is a continuous noise band.
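The placement of the demodulated noise bands can be sketched as follows. The sketch assumes that each band extends 100 c.p.s. on either side of a multiple of the stored pitch frequency, giving the 200 c.p.s. bandwidth quoted above. The function name and the default values are hypothetical.

```python
def noise_bands(stored_pitch_hz: int, count: int = 30, noise_bandwidth_hz: int = 100):
    """Return (lower edge, upper edge) of the noise band from each channel.

    The n-th band is centered on n times the stored pitch frequency and spans
    the noise bandwidth on each side of that center.
    """
    bands = []
    for n in range(1, count + 1):
        center = n * stored_pitch_hz
        bands.append((center - noise_bandwidth_hz, center + noise_bandwidth_hz))
    return bands


if __name__ == "__main__":
    for low, high in noise_bands(200)[:3]:
        print(low, high)
```

In this simplified model, a stored pitch frequency of 200 c.p.s. makes adjacent bands meet edge to edge.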

Referring to Figure 2, when switch 100 is in position 101 in the presence of a pitch-frequency signal, V2 has a very high negative potential, the bridge equilibrium of the pitch-hiss modulator is destroyed, and the output spectrum qualitatively is the same as the input spectrum from pulse generator 16, thereby permitting synthesization of a vowel. When the pitch-hiss modulator is fed by the 100 c.p.s. hiss-band and also by pulse generator 16, there is obtained a hiss-band spectrum and a harmonic line spectrum. This is utilized for consonants, thereby synthesizing speech.
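The two operating conditions of the pitch-hiss modulator can be summarized as a small state table. This is a purely logical sketch of the switching behavior described for Figure 2, not a circuit model. The function name and dictionary keys are hypothetical.

```python
def pitch_hiss_state(pitch_signal_present: bool) -> dict:
    """Describe modulator 25 for the two positions of switch 100."""
    if pitch_signal_present:
        # Relay actuated: switch in position 101, bridge unbalanced,
        # so the pulse spectrum passes through essentially unchanged.
        return {"switch_position": 101,
                "bridge_balanced": False,
                "carrier_suppressed": False,
                "output": "harmonic line spectrum"}
    # Relay de-actuated: switch in position 102, grids grounded, bridge
    # balanced, so the carrier lines are suppressed and noise sidebands remain.
    return {"switch_position": 102,
            "bridge_balanced": True,
            "carrier_suppressed": True,
            "output": "noise sidebands"}


if __name__ == "__main__":
    print(pitch_hiss_state(True))
    print(pitch_hiss_state(False))
```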

I claim:

1. Apparatus for speech synthesis utilizing two signals derived from input speech, one being a pitch frequency signal representative of a pitch frequency, and the other a system-function signal, said apparatus comprising oscillator means generating an audio sine wave signal in accordance with said pitch frequency signal, first means to push-pull modulate said sine wave signal utilizing an audio carrier signal, first filter means to derive a first upper sideband signal from said first push-pull modulation operation, second means to push-pull modulate said first upper sideband signal utilizing a high frequency carrier signal, second filter means to derive a second upper sideband signal from said second push-pull modulation operation, first means to generate a pulse of a varying frequency rate in accordance with said second upper sideband signal, third means to push-pull modulate said audio sine wave signal utilizing said high frequency carrier signal, third filter means to derive a third upper sideband signal resulting from said third push-pull modulation operation, second means to generate a pulse of a fixed frequency rate in accordance with said third upper sideband signal, means to combine said pulse of a varying frequency rate with said pulse of a fixed frequency rate into a harmonic line spectrum containing frequency pairs where the frequency difference corresponds to a harmonic of said pitch frequency, means to separate each of said pairs, means for separately amplitude modulating each of said separated frequency pairs, said separate amplitude modulating means being controlled by said system-function signal, means for demodulating the signals resulting from said separate amplitude modulating operations, and means to combine all of said demodulated signals to synthetically reproduce said input speech.

2. Apparatus as defined in claim 1 including means for scanning said system-function signal for the purpose of controlling said separate amplitude modulating means.

3. Apparatus as defined in claim 1 wherein the separate modulating means is comprised of filter means and associated modulator means for each of said frequency pairs.

4. Apparatus as defined in claim 1, including vowel-responsive means for exciting one of said pulse generating means, and combined consonant and noise-responsive means for modulating the voltage output of said pulse generating means.

5. Apparatus as defined in claim 1 including means to noise-modulate the pulse output from said varying frequency pulse generator.

6. Apparatus as defined in claim 5, including pitch-frequency signal responsive means to control the operation of said noise-modulating means.

7. Apparatus as defined in claim 1 including means to noise-modulate said system-function signal in the absence of said pitch frequency signal at the input of said speech synthesizer.

8. Apparatus as defined in claim 7 wherein said noise-modulating means is comprised of a hiss generator, means to filter the hiss signal from said hiss generator, control means to operate said first pulse generating means in the absence of said pitch frequency signal, a bridge modulator having two input terminals, the first one of said terminals normally connected to ground, the second of said terminals receiving an input signal from said first pulse generating means, said bridge modulator being inoperative during the absence of said pitch frequency signal, relay means to disconnect said first terminal from ground and connect said first terminal to receive said filtered hiss signal in the absence of said pitch frequency signal.

9. Apparatus for speech synthesis utilizing two signals derived from input speech, one being a pitch frequency signal representative of a pitch frequency, and the other a system-function signal, said apparatus comprising an audio sine wave generator whose frequency varies in accordance with said pitch frequency signal, first means to generate pulses of a fixed frequency, said frequency being greater than any audio frequency, second means to generate pulses whose frequency is equal to said fixed frequency pulses plus the frequency of said sine wave signal from said sine wave generator, said second pulse generating means controlled in accordance with said audio sine wave signal, means to combine the line spectrum of said two pulses to provide frequency pair signals whose frequency difference corresponds to a harmonic of said pitch frequency, filter means to separate each of said frequency pair signals, means for separately amplitude modulating each of said filtered frequency pair signals, said separate amplitude modulating means being controlled by said system-function signal, means for demodulating the signals from said separate modulating operations, and means to combine all of said demodulated signals to synthetically reproduce said input speech.

10. Apparatus as defined in claim 9 including means to noise-modulate said system-function signal in the absence of said pitch frequency signals, said noise-modulating means being comprised of a hiss generator, means to filter the hiss signal from said hiss generator, control means to operate said first pulse generating means in the absence

References Cited in the file of this patent

UNITED STATES PATENTS

2,098,956 Dudley Nov. 16, 1937
2,151,091 Dudley Mar. 21, 1939

